DETAILED ACTION

Claim Rejections - 35 USC §102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 
A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by another filed 
in the United States before the invention by the applicant for patent or (2) a patent granted on an application for 
patent by another filed in the United States before the invention by the applicant for patent, except that an 
international application filed under the treaty defined in section 351(a) shall have the effects for purposes of this 
subsection of an application filed in the United States only if the international application designated the United 
States and was published under Article 21(2) of such treaty in the English language. 

1. Claim 28 is rejected under 35 U.S.C. 102(e) as being anticipated by Alleva et al. (US
Patent No. 5,970,449). 

2. Regarding claim 28, Alleva et al teaches a system for text normalization in which the 
output of a speech recognizer is processed to produce a representation of the appropriate digits. 
Alleva describes the speech recognition processor that produces textual output corresponding to 
recognized portions of input speech, such that the recognizer produces text such as "ten cents" 
and "four o'clock in the afternoon", which reads on "a speech recognition processor that receives 
unconstrained input speech and outputs a string of words, the speech recognition processor being 
based on a numeric language that represents a subset of a vocabulary, the subset including a set
of words identified as being relevant for interpreting and understanding number strings," since 
the words ten, cents, four and o'clock are the vocabulary words of numeric language that are 
relevant for interpreting and understanding number strings related to currency and time (col. 3, 
line 18 to col. 4, line 6; Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56- 
62; col. 6, lines 14-17 and 40-42; col. 5, lines 62-65 and col. 6, lines 32-64); 

At col. 6, lines 14-64, Alleva et al describes the rules the text normalizer (element 38,
Figures 3A-3E) implements to process the string of words received from the speech recognizer to
generate a sequence of corresponding digits, which reads on "a numeric understanding processor 
containing classes of rules for converting the string of words into a sequence of digits." 
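
For illustration only, the kind of rule-based conversion described above can be sketched as follows. The word tables and the function name `words_to_digits` are assumptions made for this sketch; they are not taken from Alleva et al and do not reproduce the rules of that reference.

```python
# Hypothetical rule tables: an assumed, minimal numeric vocabulary.
DIGITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def words_to_digits(words):
    """Convert a recognized word string into a digit sequence.

    Words outside the assumed numeric vocabulary (e.g. "cents") are skipped.
    """
    out = []
    i = 0
    while i < len(words):
        word = words[i].lower()
        if word in TENS:
            value = TENS[word]
            nxt = words[i + 1].lower() if i + 1 < len(words) else None
            # Rule: a tens word followed by one..nine forms a single number.
            if nxt in DIGITS and DIGITS[nxt] > 0:
                value += DIGITS[nxt]
                i += 1
            out.append(str(value))
        elif word in TEENS:
            out.append(str(TEENS[word]))
        elif word in DIGITS:
            out.append(str(DIGITS[word]))
        i += 1
    return "".join(out)


print(words_to_digits("nine eight zero five two".split()))  # prints 98052
```

Under these assumed rules, a recognizer output such as "ten cents" yields the digit sequence 10, with the non-numeric word ignored.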



Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

3. Claims 17-19, 21-27, 29-34, and 36 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Alleva et al (US Patent No. 5,970,449) in view of Sukkar (US Patent No. 
5,613,037). 

4. Regarding claim 17, Alleva et al teaches

receiving a speech signal at col. 3, line 18 to col. 4, line 6; 

Alleva describes the speech recognition processor that produces textual output 
corresponding to recognized portions of input speech, such that the recognizer produces text such 
as "ten cents" and "four o'clock in the afternoon," which reads on "performing speech 
recognition process on the received speech signal to produce speech recognition results, wherein 
a numeric language includes a subset of a vocabulary, the subset of the vocabulary including 
words that identify digits in number strings and words that enable the interpretation and
understanding of number strings," since the words ten, cents, four and o'clock are the
vocabulary words of numeric language that are relevant for interpreting and understanding
number strings related to currency and time (col. 3, line 18 to col. 4, line 6; Abstract; Figure 1,
element 32; Figure 9, element 132; col. 1, lines 56-62; col. 6, lines 14-17 and 40-42; col. 5, lines
62-65 and col. 6, lines 32-64);

Application/Control Number: 09/314,637 Page 4

Art Unit: 2654

At col. 6, lines 14-64 and Figure 9, elements 122, 124, 126, 128, and 130, Alleva et al
describes the rules the text normalizer (element 38, Figures 3A-3E) implements to process the 
string of words received from the speech recognizer to generate a sequence of corresponding 
digits, which reads on "generating a sequence of digits using said speech recognition results, said 
generating being based on a set of rules." 

Alleva fails to explicitly teach a system comprising acoustic models utilized by the
speech recognition processor. However, implementation of acoustic models in a speech 
recognition system was well known in the art. 

In a similar field of endeavor, Sukkar discloses a speech recognition system comprising
an acoustic model utilized by the speech recognition processor (Figure 3, element 308).
Additionally, Sukkar teaches a digit model for digit recognition and a second model, a filler
model, a generalized HMM model of spoken words that do not contain digits (col. 3, line 19 to
col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 18, Alleva teaches the speech recognition processor at Figure 1, element
32; col. 3, line 18 to col. 4, line 6. 

Regarding claim 19, Alleva does not teach that the recognition process is based on a set of
acoustical models that has been defined for other words in the vocabulary.

Sukkar discloses a speech recognition system comprising an acoustic model utilized by the
speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of 
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 21, Alleva teaches the speech recognition processor that produces
textual output corresponding to recognized portions of input speech, such that the recognizer 
produces text such as "ten cents," "April first nineteen ninety seven," "Seattle Washington nine 
eight zero five two" and "four o'clock in the afternoon," which reads on "numeric language 
includes digits, natural numbers, alphabets, and city/country name classes," since the words ten, 
cents, April, Seattle, Washington, four and o'clock are the vocabulary words of numeric 
language that are relevant for interpreting and understanding number strings related to classes of 
digits, natural numbers, alphabets, and city/country name (col. 3, line 18 to col. 4, line 6;
Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56-62; col. 6, lines 14-17 and 
40-42; col. 5, lines 62-65 and col. 6, lines 32-64). 

Alleva does not teach the numeric language includes a re-starts class. At col. 5, lines 48-52,
Sukkar discloses implementation of a misrecognition classifier, so as to account for the errors
during recognition. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement words in the numeric language related to recognition errors to account for errors 
during the recognition process, as suggested by Sukkar, for the purpose of providing reliable and 
accurate recognition and thereby improve system performance. 

Regarding claim 22, Alleva does not explicitly teach acoustic models are hidden Markov 
models. Sukkar discloses a speech recognition system comprising an acoustic model utilized by
the speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit
model for digit recognition and a second model, a filler model, a generalized HMM model of
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 23, Alleva does not teach a numeric recognition processor. Sukkar 
discloses a speech recognition system comprising an acoustic model utilized by the speech
recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit model for 
digit recognition and a second model, a filler model, a generalized HMM model of spoken words 
that do not contain digits (col. 3, line 19 to col. 4, line 22).

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement a numeric recognition processor as taught by Sukkar in the recognition system of 
Alleva, for the purpose of accurately producing vector representations of recognized numbers of
the received input speech. 

Regarding claims 24 and 26-27, Alleva teaches a set of rules includes a naturals rule, a 
restarts rule, a city/country rule, and an alphabets rule at Figure 9, element 126 and col. 6, line 3 to
col. 7, line 9. 

Regarding claim 25, Alleva does not teach the set of rules includes re-starts rules. At col. 
5, lines 48-52, Sukkar discloses implementation of a misrecognition classifier, so as to account for
the errors during recognition. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to process or normalize the words in the numeric language output from the speech
recognizer that are related to recognition errors to account for errors during the recognition 
process, as suggested by Sukkar, for the purpose of providing reliable and accurate recognition 
and thereby improve system performance. 

Regarding claim 29, Alleva fails to explicitly teach a system comprising acoustic models 
utilized by the speech recognition processor. However, implementation of acoustic models in a 
speech recognition system was well known in the art. 

Sukkar discloses a speech recognition system comprising an acoustic model utilized by the
speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of 
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 30, Alleva fails to explicitly teach a first set of hidden Markov models 
that characterize acoustic features of words in the numeric language and a second set of hidden 
Markov models that characterize acoustic features of words in the remainder of the vocabulary. 

Sukkar teaches a digit model for digit recognition and a second model, a filler model, a 
generalized HMM model of spoken words that do not contain digits (col. 3, line 19 to col. 4, line 
22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic hidden Markov model teachings of Sukkar in the recognition system 
of Alleva, for the purpose of accurately producing vector representations of the received input 
speech. 

Regarding claim 31, Alleva fails to explicitly teach a filler model that characterizes out of 
vocabulary features. 

Sukkar teaches a digit model for digit recognition and a second model, a filler model, a 
generalized HMM model of spoken words that do not contain digits (col. 3, line 19 to col. 4, line 
22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic hidden Markov model teachings of a filler model, as suggested by 
Sukkar in the recognition system of Alleva, for the purpose of accurately producing vector 
representations of the received input speech to accurately distinguish numeric input from other
speech input. 

Regarding claim 32, Alleva fails to teach an utterance verification processor. At col. 5, 
lines 44-52, Sukkar describes a digit/non-digit classification that identifies speech containing 
valid digits, speech not containing a digit and speech containing misrecognitions. Sukkar 
teaches the misrecognitions are identified as non-digits so that errors can be rejected and not 
classified as valid digit data.

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva to implement utterance verification as taught by Sukkar, for the 
purpose of ensuring that only valid digit information is recognized and classified as actual digit 
speech. 

Regarding claim 33, Alleva does not teach a validation database or a string validation 
processor. At col. 7, lines 6-49, Sukkar describes candidate string validation based on individual 
candidate digit confidence scores that are determined using a digit vocabulary set of the digit 
models. Sukkar teaches the string validation is implemented so that errors in the string cause the 
string to be rejected, which is desirable for many applications. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva to implement string validation as taught by Sukkar, for the 
purpose of ensuring that only valid digit information is accepted and applications using the
system process and operate with valid data. 
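
As a rough sketch of the string validation idea discussed above (a candidate string is rejected when an individual digit is not recognized with sufficient confidence), consider the following. The function name, the 0-to-1 score scale, and the default threshold are hypothetical choices for this illustration; Sukkar's actual confidence computation is not reproduced here.

```python
def validate_digit_string(scored_digits, threshold=0.5):
    """Accept a candidate digit string only if every digit is confident enough.

    scored_digits: list of (digit, confidence) pairs, confidence assumed in [0, 1].
    threshold: assumed per-digit acceptance threshold (hypothetical value).
    Returns the digit string when accepted, or None when rejected.
    """
    if not scored_digits:
        return None  # nothing recognized, so nothing to accept
    if all(conf >= threshold for _, conf in scored_digits):
        return "".join(digit for digit, _ in scored_digits)
    return None  # a single low-confidence digit rejects the whole string


accepted = validate_digit_string([("9", 0.91), ("8", 0.84), ("0", 0.77)])
rejected = validate_digit_string([("9", 0.91), ("8", 0.12), ("0", 0.77)])
```

Here `accepted` holds the full string and `rejected` is `None`, so a downstream application only ever receives strings whose digits all passed the assumed threshold.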

Regarding claim 34, at col. 8, lines 28-38, Alleva teaches the normalizer normalizes the
text and a speech API forwards the normalized content to an application program, which reads on
"a dialogue manager processor that initiates an action based on the validity information." 

Regarding claim 36, Alleva teaches a set of rules includes a naturals rule, a restarts rule, a
city/country rule, and an alphabets rule at Figure 9, element 126 and col. 6, line 3 to col. 7, line 9.
Alleva does not teach the set of rules includes a re-starts rule. At col. 5, lines 48-52, Sukkar
discloses implementation of a misrecognition classifier, so as to account for the errors during 
recognition. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to process or normalize the words in the numeric language output from the speech
recognizer that are related to recognition errors to account for errors during the recognition 
process, as suggested by Sukkar, for the purpose of providing reliable and accurate recognition 
and thereby improve system performance. 

5. Claims 13-16, 20 and 35 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Alleva et al (US Patent No. 5,970,449) in view of Sukkar (US Patent No. 5,613,037) and further
in view of Huang et al (US Patent No. 5,937,384). 

6. Regarding claim 13, Alleva describes the speech recognition processor that produces
textual output corresponding to recognized portions of input speech, such that the recognizer 
produces text such as "ten cents" and "four o'clock in the afternoon," which reads on "a speech 
recognition method comprising, defining a numeric language, the numeric language including a 
subset of a vocabulary, the subset of the vocabulary including words that identify digits in 
number strings and words that enable the interpretation and understanding of number strings,"
since the words ten, cents, four and o'clock are the vocabulary words of numeric language that 
are relevant for interpreting and understanding number strings related to currency and time (col. 
3, line 18 to col. 4, line 6; Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56- 
62; col. 6, lines 14-17 and 40-42; col. 5, lines 62-65 and col. 6, lines 32-64); 

Alleva does not teach a set of acoustic models for the numeric language, a second set of
acoustical models that has been defined for other words in the vocabulary or storing the first and 
second set of acoustical models in an acoustic model database. 

Sukkar discloses a speech recognition system comprising an acoustic model utilized by the
speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of 
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the
purpose of accurately producing vector representations of the received input speech. 

Alleva and Sukkar do not implement a first quality level for the first acoustic models and 
a second quality level for the second acoustic models. Huang teaches a method and system for
speech recognition using continuous density hidden Markov models, which implements
context-dependent HMMs and context-independent HMMs and teaches that the use of both types of
HMMs is beneficial in achieving an improved recognition accuracy (col. 6, lines 18-38). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva and Sukkar to implement both context-dependent HMMs and 
context-independent HMMs, as taught by Huang, for the purpose of achieving an improved recognition
accuracy, as suggested by Huang.

Regarding claim 14, Alleva teaches the speech recognition processor that produces 
textual output corresponding to recognized portions of input speech, such that the recognizer 
produces text such as "ten cents," "April first nineteen ninety seven," "Seattle Washington nine 
eight zero five two" and "four o'clock in the afternoon," which reads on "numeric language 
includes digits, natural numbers, alphabets, and city/country name classes," since the words ten, 
cents, April, Seattle, Washington, four and o'clock are the vocabulary words of numeric 
language that are relevant for interpreting and understanding number strings related to classes of 
digits, natural numbers, alphabets, and city/country name (col. 3, line 18 to col. 4, line 6; 
Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56-62; col. 6, lines 14-17 and
40-42; col. 5, lines 62-65 and col. 6, lines 32-64). 

Alleva does not teach the numeric language includes a re-starts class. At col. 5, lines 48-52,
Sukkar discloses implementation of a misrecognition classifier, so as to account for the errors
during recognition. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement words in the numeric language related to recognition errors to account for errors 
during the recognition process, as suggested by Sukkar, for the purpose of providing reliable and 
accurate recognition and thereby improve system performance. 

Regarding claim 15, Alleva does not explicitly teach acoustic models are hidden Markov 
models. Sukkar discloses a speech recognition system comprising an acoustic model utilized by
the speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 16, Alleva fails to explicitly teach a filler model that characterizes out of 
vocabulary features. Sukkar teaches a digit model for digit recognition and a second model, a 
filler model, a generalized HMM model of spoken words that do not contain digits (col. 3, line 
19 to col. 4, line 22).

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic hidden Markov model teachings of a filler model, as suggested by 
Sukkar in the recognition system of Alleva, for the purpose of accurately producing vector 
representations of the received input speech to accurately distinguish numeric input from other 
speech input. 

Regarding claim 20, Alleva and Sukkar do not implement a first quality level for the first 
acoustic models and a second quality level for the second acoustic models. Huang teaches a 
method and system for speech recognition using continuous density hidden Markov models, 
which implements context-dependent HMMs and context-independent HMMs and teaches that 
the use of both types of HMMs is beneficial in achieving an improved recognition accuracy (col. 
6, lines 18-38). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva and Sukkar to implement both context-dependent HMMs and
context-independent HMMs, as taught by Huang, for the purpose of achieving an improved recognition
accuracy, as suggested by Huang.

Regarding claim 35, Alleva and Sukkar do not specifically teach a language model
database that stores data describing the structure and sequence of words and phrases. Huang 
teaches a language model that represents linguistic expressions and describes the implementation 
of language model in predicting the likelihood of occurrence of a word considering the words 
that have been uttered (col. 14, lines 35-54) and teaches the system is beneficial in improving the 
recognition capability of a speech recognition system. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva and Sukkar to implement language models in predicting 
likelihoods of word occurrences, as taught by Huang, for the purpose of improving recognition 
capability of the speech recognizer. 
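
To make concrete what "predicting the likelihood of occurrence of a word considering the words that have been uttered" can mean, a bigram model is one simple instance. The sketch below is an assumption chosen for illustration (function names and the `<s>` start marker are hypothetical); it is not asserted to be the language model of Huang.

```python
from collections import Counter


def train_bigram(sentences):
    """Count, for each word, how often each following word occurs."""
    counts = {}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        for prev, cur in zip(tokens, tokens[1:]):
            counts.setdefault(prev, Counter())[cur] += 1
    return counts


def bigram_probability(counts, prev, word):
    """Estimate P(word | prev) by relative frequency; 0.0 if prev is unseen."""
    followers = counts.get(prev)
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())


model = train_bigram(["four o'clock", "four dollars", "four o'clock"])
p = bigram_probability(model, "four", "o'clock")  # relative frequency 2/3
```

A recognizer could use such estimates to prefer word sequences that are more likely given the preceding word; smoothing for unseen pairs is omitted from this sketch.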



Response to Arguments 
7. Applicant's arguments filed June 25, 2004 have been fully considered but they are not
persuasive. 

Regarding claim 28, Applicant argues the Examiner blends the functions of the text
normalizer with the function of the speech recognizer to reject claim 28. Applicant also argues
the Examiner is inappropriately altering the teachings of Alleva et al to attempt to match the
reference to the claim limitations. In response, the Examiner argues, as indicated in the previous
office action and in the rejection above, Alleva specifically describes a speech recognition 
processor (as cited above) that produces a textual output that corresponds to the recognized
portions of the input speech, which provides support for the claimed "speech recognizer." 
Additionally, as indicated in the previous office action and in the rejection above, the text 
normalizer provides support for the claimed "numeric understanding processor." The Examiner
contends that the two specific descriptions of the speech recognizer and the text normalizer 
provide adequate support for the claim limitations of claim 28. 

Regarding claims 17-19, 21-27, 29-34 and 36, in response to applicant's argument that 
there is no suggestion to combine the references, the examiner recognizes that obviousness can 
only be established by combining or modifying the teachings of the prior art to produce the 
claimed invention where there is some teaching, suggestion, or motivation to do so found either 
in the references themselves or in the knowledge generally available to one of ordinary skill in 
the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988) and In re Jones, 958 
F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992). In this case, implementation of acoustic models in 
a speech recognition system was well known to one of ordinary skill in the art of speech signal 
processing to achieve improved accuracy. 

Conclusion 

8. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO
MONTHS of the mailing date of this final action and the advisory action is not mailed until after
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, 
however, will the statutory period for reply expire later than SIX MONTHS from the mailing 
date of this final action. 

9. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Angela A. Armstrong whose telephone number is 703-308-6258. 
The examiner can normally be reached on Monday-Thursday 7:30-5:00 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil can be reached on (703) 305-9645. The fax phone number for the 
organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).